library(tidyverse)
flexibility <- seq(1, 10, length.out = 100)
bias_squared <- (10 - flexibility)^2 / 100
variance <- flexibility^2 / 100
training_error <- bias_squared - (flexibility / 50) + 0.2
irreducible_error <- rep(0.2, length(flexibility))
test_error <- bias_squared + variance + irreducible_error
data <- data.frame(flexibility, bias_squared, variance,
training_error, test_error, irreducible_error)
library(reshape2)
data_melted <- melt(data, id.vars = 'flexibility')
ggplot(data_melted, aes(x = flexibility, y = value, color = variable)) +
geom_line() +
labs(x = 'Flexibility', y = 'Error', title = 'Bias-Variance Decomposition') +
theme_minimal() +
scale_color_discrete(name = "Curves", labels = c("Squared Bias", "Variance", "Training Error",
"Test Error", "Bayes (Irreducible) Error"))Biostat 212a Homework 1
Due Jan 23, 2024 @ 11:59PM
1 Filling gaps in lecture notes (10pts)
Consider the regression model \[ Y = f(X) + \epsilon, \] where \(\operatorname{E}(\epsilon) = 0\).
1.1 Optimal regression function
Show that the choice \[ f_{\text{opt}}(X) = \operatorname{E}(Y | X) \] minimizes the mean squared prediction error \[ \operatorname{E}\{[Y - f(X)]^2\}, \] where the expectation averages over variations in both \(X\) and \(Y\). (Hint: condition on \(X\).)
answer: Condition on \(X = x\) and expand: \[ \operatorname{E}\{[Y - f(X)]^2 \mid X = x\} = \operatorname{E}[Y^2 \mid X = x] - 2 f(x)\operatorname{E}[Y \mid X = x] + f(x)^2 \] Taking the derivative with respect to \(f(x)\) and setting it to zero: \[ -2\operatorname{E}[Y \mid X = x] + 2 f(x) = 0 \implies f(x) = \operatorname{E}[Y \mid X = x] \] The second derivative is \(2 > 0\), so this choice minimizes the conditional expected squared error at every \(x\); averaging over \(X\) then shows that \(f_{\text{opt}}(X) = \operatorname{E}(Y \mid X)\) minimizes the mean squared prediction error.
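The claim that the conditional mean minimizes squared error can be illustrated numerically: for a fixed conditioning value, the minimizing constant of the average squared loss should coincide with the sample mean. A minimal sketch (the simulated data and search interval are illustrative):

```r
# For a sample y, the constant c minimizing mean((y - c)^2) should be mean(y)
set.seed(1)
y <- rnorm(1000, mean = 3)
opt <- optimize(function(c) mean((y - c)^2), interval = c(-10, 10))
c(minimizer = opt$minimum, sample_mean = mean(y))  # essentially equal
```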
1.2 Bias-variance trade-off
Given an estimate \(\hat f\) of \(f\), show that the test error at a \(x_0\) can be decomposed as \[ \operatorname{E}\{[y_0 - \hat f(x_0)]^2\} = \underbrace{\operatorname{Var}(\hat f(x_0)) + [\operatorname{Bias}(\hat f(x_0))]^2}_{\text{MSE of } \hat f(x_0) \text{ for estimating } f(x_0)} + \underbrace{\operatorname{Var}(\epsilon)}_{\text{irreducible}}, \] where the expectation averages over the variability in \(y_0\) and \(\hat f\).
answer: Write \(y_0 = f(x_0) + \epsilon\) with \(\operatorname{E}(\epsilon) = 0\) and \(\epsilon\) independent of \(\hat f(x_0)\). Adding and subtracting \(f(x_0)\): \[ \operatorname{E}\{[y_0 - \hat f(x_0)]^2\} \\ = \operatorname{E}\{[y_0 - f(x_0) + f(x_0) - \hat f(x_0)]^2\} \\ = \operatorname{E}\{[y_0 - f(x_0)]^2\} + \operatorname{E}\{[f(x_0) - \hat f(x_0)]^2\} + 2\operatorname{E}\{[y_0 - f(x_0)][f(x_0) - \hat f(x_0)]\} \] For the first term, since \(y_0 - f(x_0) = \epsilon\) and \(\operatorname{E}(\epsilon) = 0\), \[ \operatorname{E}\{[y_0 - f(x_0)]^2\} = \operatorname{E}(\epsilon^2) = \operatorname{Var}(\epsilon) + [\operatorname{E}(\epsilon)]^2 = \operatorname{Var}(\epsilon) \] The cross term vanishes because \(\epsilon\) is independent of \(\hat f(x_0)\): \[ 2\operatorname{E}\{\epsilon\,[f(x_0) - \hat f(x_0)]\} = 2\operatorname{E}(\epsilon)\operatorname{E}[f(x_0) - \hat f(x_0)] = 0 \] For the middle term, add and subtract \(\operatorname{E}[\hat f(x_0)]\); the cross term again vanishes since \(\operatorname{E}\{\hat f(x_0) - \operatorname{E}[\hat f(x_0)]\} = 0\): \[ \operatorname{E}\{[f(x_0) - \hat f(x_0)]^2\} \\ = \{f(x_0) - \operatorname{E}[\hat f(x_0)]\}^2 + \operatorname{E}\{[\hat f(x_0) - \operatorname{E}[\hat f(x_0)]]^2\} \\ = [\operatorname{Bias}(\hat f(x_0))]^2 + \operatorname{Var}(\hat f(x_0)) \] Combining the three pieces: \[ \operatorname{E}\{[y_0 - \hat f(x_0)]^2\} = \operatorname{Var}(\hat f(x_0)) + [\operatorname{Bias}(\hat f(x_0))]^2 + \operatorname{Var}(\epsilon) \]
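The decomposition can also be checked by simulation. A minimal Monte Carlo sketch, under the hypothetical setup \(f(x) = x^2\) with Gaussian noise of SD `sigma` and \(\hat f\) a straight line refit to each fresh training set of size 20, evaluated at \(x_0 = 0.9\):

```r
set.seed(212)
f <- function(x) x^2
x0 <- 0.9
sigma <- 0.5
B <- 5000
# Refit the line B times on fresh training data; record the prediction at x0
fhat_x0 <- replicate(B, {
  x <- runif(20)
  y <- f(x) + rnorm(20, sd = sigma)
  unname(predict(lm(y ~ x), newdata = data.frame(x = x0)))
})
y0 <- f(x0) + rnorm(B, sd = sigma)
lhs <- mean((y0 - fhat_x0)^2)                              # test error at x0
rhs <- var(fhat_x0) + (mean(fhat_x0) - f(x0))^2 + sigma^2  # Var + Bias^2 + Var(eps)
c(lhs = lhs, rhs = rhs)  # the two should agree up to Monte Carlo error
```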
2 ISL Exercise 2.4.3 (10pts)
- Test error decreases initially but rises as flexibility increases and the model begins to overfit. Bias decreases as flexibility grows, since a more flexible model can represent the true function better, while variance increases, especially at high flexibility, making the fit less robust. This highlights the challenge of balancing model complexity for optimal performance.
3 ISL Exercise 2.4.4 (10pts)
- Medical Diagnosis: Response: Diagnosis (e.g., disease present or not). Predictors: Patient symptoms, lab test results, demographic data (age, gender), medical history. Goal: The goal is primarily prediction. The emphasis is on accurately predicting whether a patient has a specific disease or condition based on their symptoms and test results. While inference can be valuable for understanding which factors are most predictive of certain diseases, the immediate utility is in the accurate and efficient prediction of the disease for treatment decisions.
- Credit Scoring in Finance: Response: Creditworthiness (e.g., high or low credit risk). Predictors: Credit history, current debts, income, employment status, past loan repayment history, credit score. Goal: This application leans towards prediction. Financial institutions use these models to predict the likelihood that an individual will repay a loan. Understanding the factors that influence credit risk is important, but the primary objective is to predict an individual’s credit risk to make lending decisions.
- Customer Churn Prediction in Business: Response: Churn (e.g., whether a customer will stop using a company’s products/services). Predictors: Customer interaction history, purchase history, customer service records, demographic data, usage patterns. Goal: Again, the goal is prediction. Companies use these models to predict which customers are at risk of leaving so that they can take proactive measures to retain them. While inference might help understand why customers churn, the direct aim is to predict churn to implement retention strategies.
- Real Estate Pricing: Response: House price. Predictors: Size (square footage), location, number of bedrooms, age of the house, proximity to amenities, etc. Goal: This application is primarily for prediction. The focus is on predicting the price of a house based on various features, which is crucial for buyers, sellers, and real estate agents.
- Weather Forecasting: Response: Temperature. Predictors: Humidity, atmospheric pressure, wind speed, historical temperature data, time of the year, etc. Goal: The goal is prediction. Accurate temperature forecasts based on current and historical weather data are vital for a range of activities, from agriculture to daily planning.
- Educational Outcomes: Response: Student academic performance (e.g., grades or test scores). Predictors: Study hours, attendance, parental education level, socioeconomic status, previous academic records, etc. Goal: This can be for both inference and prediction. While predicting student performance is valuable, understanding the impact of various factors (like study hours or socioeconomic status) on academic outcomes is also crucial for educational policy and interventions.
- Market Segmentation: Objective: To categorize customers into different segments based on their purchasing behavior, preferences, demographic characteristics, etc. Application: Companies can use cluster analysis to identify distinct groups within their customer base. This helps in tailoring marketing strategies, developing targeted products, and improving customer service by understanding the specific needs and preferences of each segment.
- Genomic Data Classification in Biology: Objective: To classify genetic data for identifying patterns and similarities in DNA sequences. Application: In biological research, cluster analysis is employed to group genes with similar expression patterns, which can be indicative of shared functions or regulatory mechanisms. This is crucial in understanding genetic diseases, evolutionary biology, and the development of targeted treatments.
- Document Classification: Objective: To organize and categorize large sets of digital documents based on their content and thematic similarities. Application: This is particularly useful in digital libraries, online research databases, and for information retrieval systems. By clustering documents, these systems can enhance search accuracy, improve the organization of information, and enable users to discover related content more effectively.
4 ISL Exercise 2.4.10 (30pts)
You can read in the Boston data set directly from the url https://raw.githubusercontent.com/ucla-biostat-212a/2024winter/master/slides/data/Boston.csv. A documentation of the Boston data set is here.
4.0.1 R
library(tidyverse)
Boston <- read_csv("https://raw.githubusercontent.com/ucla-biostat-212a/2024winter/master/slides/data/Boston.csv",
col_select = -1) %>%
print(width = Inf)
# A tibble: 506 × 13
crim zn indus chas nox rm age dis rad tax ptratio lstat
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 0.00632 18 2.31 0 0.538 6.58 65.2 4.09 1 296 15.3 4.98
2 0.0273 0 7.07 0 0.469 6.42 78.9 4.97 2 242 17.8 9.14
3 0.0273 0 7.07 0 0.469 7.18 61.1 4.97 2 242 17.8 4.03
4 0.0324 0 2.18 0 0.458 7.00 45.8 6.06 3 222 18.7 2.94
5 0.0690 0 2.18 0 0.458 7.15 54.2 6.06 3 222 18.7 5.33
6 0.0298 0 2.18 0 0.458 6.43 58.7 6.06 3 222 18.7 5.21
7 0.0883 12.5 7.87 0 0.524 6.01 66.6 5.56 5 311 15.2 12.4
8 0.145 12.5 7.87 0 0.524 6.17 96.1 5.95 5 311 15.2 19.2
9 0.211 12.5 7.87 0 0.524 5.63 100 6.08 5 311 15.2 29.9
10 0.170 12.5 7.87 0 0.524 6.00 85.9 6.59 5 311 15.2 17.1
medv
<dbl>
1 24
2 21.6
3 34.7
4 33.4
5 36.2
6 28.7
7 22.9
8 27.1
9 16.5
10 18.9
# ℹ 496 more rows
answer: There are 506 rows and 13 columns in the data set. Each row represents the set of predictor observations for a given neighborhood in Boston. Each column represents one variable measured on the 506 Boston neighborhoods.
str(Boston)
tibble [506 × 13] (S3: tbl_df/tbl/data.frame)
$ crim : num [1:506] 0.00632 0.02731 0.02729 0.03237 0.06905 ...
$ zn : num [1:506] 18 0 0 0 0 0 12.5 12.5 12.5 12.5 ...
$ indus : num [1:506] 2.31 7.07 7.07 2.18 2.18 2.18 7.87 7.87 7.87 7.87 ...
$ chas : num [1:506] 0 0 0 0 0 0 0 0 0 0 ...
$ nox : num [1:506] 0.538 0.469 0.469 0.458 0.458 0.458 0.524 0.524 0.524 0.524 ...
$ rm : num [1:506] 6.58 6.42 7.18 7 7.15 ...
$ age : num [1:506] 65.2 78.9 61.1 45.8 54.2 58.7 66.6 96.1 100 85.9 ...
$ dis : num [1:506] 4.09 4.97 4.97 6.06 6.06 ...
$ rad : num [1:506] 1 2 2 3 3 3 5 5 5 5 ...
$ tax : num [1:506] 296 242 242 222 222 222 311 311 311 311 ...
$ ptratio: num [1:506] 15.3 17.8 17.8 18.7 18.7 18.7 15.2 15.2 15.2 15.2 ...
$ lstat : num [1:506] 4.98 9.14 4.03 2.94 5.33 ...
$ medv : num [1:506] 24 21.6 34.7 33.4 36.2 28.7 22.9 27.1 16.5 18.9 ...
- attr(*, "spec")=
.. cols(
.. ...1 = col_skip(),
.. crim = col_double(),
.. zn = col_double(),
.. indus = col_double(),
.. chas = col_double(),
.. nox = col_double(),
.. rm = col_double(),
.. age = col_double(),
.. dis = col_double(),
.. rad = col_double(),
.. tax = col_double(),
.. ptratio = col_double(),
.. lstat = col_double(),
.. medv = col_double()
.. )
Boston$chas <- as.numeric(Boston$chas)
Boston$rad <- as.numeric(Boston$rad)
pairs(Boston)
answer: Not much can be discerned other than the fact that some variables appear to be correlated.
cor(Boston)
               crim          zn       indus         chas        nox
crim 1.00000000 -0.20046922 0.40658341 -0.055891582 0.42097171
zn -0.20046922 1.00000000 -0.53382819 -0.042696719 -0.51660371
indus 0.40658341 -0.53382819 1.00000000 0.062938027 0.76365145
chas -0.05589158 -0.04269672 0.06293803 1.000000000 0.09120281
nox 0.42097171 -0.51660371 0.76365145 0.091202807 1.00000000
rm -0.21924670 0.31199059 -0.39167585 0.091251225 -0.30218819
age 0.35273425 -0.56953734 0.64477851 0.086517774 0.73147010
dis -0.37967009 0.66440822 -0.70802699 -0.099175780 -0.76923011
rad 0.62550515 -0.31194783 0.59512927 -0.007368241 0.61144056
tax 0.58276431 -0.31456332 0.72076018 -0.035586518 0.66802320
ptratio 0.28994558 -0.39167855 0.38324756 -0.121515174 0.18893268
lstat 0.45562148 -0.41299457 0.60379972 -0.053929298 0.59087892
medv -0.38830461 0.36044534 -0.48372516 0.175260177 -0.42732077
rm age dis rad tax ptratio
crim -0.21924670 0.35273425 -0.37967009 0.625505145 0.58276431 0.2899456
zn 0.31199059 -0.56953734 0.66440822 -0.311947826 -0.31456332 -0.3916785
indus -0.39167585 0.64477851 -0.70802699 0.595129275 0.72076018 0.3832476
chas 0.09125123 0.08651777 -0.09917578 -0.007368241 -0.03558652 -0.1215152
nox -0.30218819 0.73147010 -0.76923011 0.611440563 0.66802320 0.1889327
rm 1.00000000 -0.24026493 0.20524621 -0.209846668 -0.29204783 -0.3555015
age -0.24026493 1.00000000 -0.74788054 0.456022452 0.50645559 0.2615150
dis 0.20524621 -0.74788054 1.00000000 -0.494587930 -0.53443158 -0.2324705
rad -0.20984667 0.45602245 -0.49458793 1.000000000 0.91022819 0.4647412
tax -0.29204783 0.50645559 -0.53443158 0.910228189 1.00000000 0.4608530
ptratio -0.35550149 0.26151501 -0.23247054 0.464741179 0.46085304 1.0000000
lstat -0.61380827 0.60233853 -0.49699583 0.488676335 0.54399341 0.3740443
medv 0.69535995 -0.37695457 0.24992873 -0.381626231 -0.46853593 -0.5077867
lstat medv
crim 0.4556215 -0.3883046
zn -0.4129946 0.3604453
indus 0.6037997 -0.4837252
chas -0.0539293 0.1752602
nox 0.5908789 -0.4273208
rm -0.6138083 0.6953599
age 0.6023385 -0.3769546
dis -0.4969958 0.2499287
rad 0.4886763 -0.3816262
tax 0.5439934 -0.4685359
ptratio 0.3740443 -0.5077867
lstat 1.0000000 -0.7376627
medv -0.7376627 1.0000000
answer: The variables most strongly correlated with medv are lstat (−0.74) and rm (0.70). lstat is in turn most strongly correlated with medv (−0.74) and rm (−0.61), and rm with medv (0.70) and lstat (−0.61). For crim, the strongest correlations are with rad (0.63) and tax (0.58).
summary(Boston$crim)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
 0.00632  0.08204  0.25651  3.61352  3.67708 88.97620 
summary(Boston$tax)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  187.0   279.0   330.0   408.2   666.0   711.0 
summary(Boston$ptratio)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  12.60   17.40   19.05   18.46   20.20   22.00 
qplot(Boston$crim, binwidth = 5, xlab = "Crime rate", ylab = "Number of Suburbs")
Warning: `qplot()` was deprecated in ggplot2 3.4.0.
qplot(Boston$tax, binwidth = 50, xlab = "Full-value property-tax rate per $10,000", ylab = "Number of Suburbs")
qplot(Boston$ptratio, binwidth = 5, xlab = "Pupil-teacher ratio by town", ylab = "Number of Suburbs")
selection <- subset(Boston, crim > 10)
nrow(selection) / nrow(Boston)
[1] 0.1067194
selection <- subset(Boston, crim > 60)
nrow(selection) / nrow(Boston)
[1] 0.005928854
selection <- subset(Boston, tax > 600)
nrow(selection) / nrow(Boston)
[1] 0.270751
selection <- subset(Boston, tax < 600)
nrow(selection) / nrow(Boston)
[1] 0.729249
selection <- subset(Boston, ptratio > 17.5)
nrow(selection) / nrow(Boston)
[1] 0.715415
selection <- subset(Boston, ptratio < 17.5)
nrow(selection) / nrow(Boston)
[1] 0.284585
answer: For the crime rate, the median is 0.26 but the maximum is 88.98, so some neighborhoods have very high crime rates: about 11% of neighborhoods have crime rates above 10, and 0.6% above 60. Based on the histogram, there is a cluster of towns with very high tax rates; the median is 330, the mean 408.2, and the maximum 711, with 27% of neighborhoods above 600 and 73% below. Based on the histogram of pupil-teacher ratio, the median is 19.05, the mean 18.46, and the maximum 22; 72% of neighborhoods have a pupil-teacher ratio above 17.5 and 28% below.
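The same tail proportions can be computed more directly by averaging a logical vector, without building an intermediate subset. A small sketch, assuming Boston has been read in as above (the quoted proportions come from the outputs shown earlier):

```r
mean(Boston$crim > 10)       # 0.1067194, tracts with crime rate above 10
mean(Boston$tax > 600)       # 0.270751, tracts with tax rate above 600
mean(Boston$ptratio > 17.5)  # 0.715415, tracts with pupil-teacher ratio above 17.5
```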
nrow(subset(Boston, chas == 1))
[1] 35
answer: There are 35 census tracts that bound the Charles River.
summary(Boston$ptratio)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
12.60 17.40 19.05 18.46 20.20 22.00
answer: The median pupil-teacher ratio among the towns in this data set is 19.05.
selection <- Boston[order(Boston$medv),]
selection[1, ]
# A tibble: 1 × 13
crim zn indus chas nox rm age dis rad tax ptratio lstat
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 38.4 0 18.1 0 0.693 5.45 100 1.49 24 666 20.2 30.6
# ℹ 1 more variable: medv <dbl>
summary(selection)
      crim                zn             indus            chas        
Min. : 0.00632 Min. : 0.00 Min. : 0.46 Min. :0.00000
1st Qu.: 0.08205 1st Qu.: 0.00 1st Qu.: 5.19 1st Qu.:0.00000
Median : 0.25651 Median : 0.00 Median : 9.69 Median :0.00000
Mean : 3.61352 Mean : 11.36 Mean :11.14 Mean :0.06917
3rd Qu.: 3.67708 3rd Qu.: 12.50 3rd Qu.:18.10 3rd Qu.:0.00000
Max. :88.97620 Max. :100.00 Max. :27.74 Max. :1.00000
nox rm age dis
Min. :0.3850 Min. :3.561 Min. : 2.90 Min. : 1.130
1st Qu.:0.4490 1st Qu.:5.886 1st Qu.: 45.02 1st Qu.: 2.100
Median :0.5380 Median :6.208 Median : 77.50 Median : 3.207
Mean :0.5547 Mean :6.285 Mean : 68.57 Mean : 3.795
3rd Qu.:0.6240 3rd Qu.:6.623 3rd Qu.: 94.08 3rd Qu.: 5.188
Max. :0.8710 Max. :8.780 Max. :100.00 Max. :12.127
rad tax ptratio lstat
Min. : 1.000 Min. :187.0 Min. :12.60 Min. : 1.73
1st Qu.: 4.000 1st Qu.:279.0 1st Qu.:17.40 1st Qu.: 6.95
Median : 5.000 Median :330.0 Median :19.05 Median :11.36
Mean : 9.549 Mean :408.2 Mean :18.46 Mean :12.65
3rd Qu.:24.000 3rd Qu.:666.0 3rd Qu.:20.20 3rd Qu.:16.95
Max. :24.000 Max. :711.0 Max. :22.00 Max. :37.97
medv
Min. : 5.00
1st Qu.:17.02
Median :21.20
Mean :22.53
3rd Qu.:25.00
Max. :50.00
answer: The lowest median value of owner-occupied homes is 5 (i.e., $5,000). Compared with the overall ranges, this census tract has several distinctive characteristics. Its crime rate of 38.4 far exceeds the median (0.26) and even the third quartile, so the area has an unusually high crime rate. The tract has no residential land zoned for large lots (zn = 0), a high industrial proportion (indus = 18.1), and the oldest possible housing stock (age = 100), indicating a fully developed area. The nox value is high (0.693), consistent with industrial pollution. The average number of rooms is low (rm = 5.45), and the distance to employment centers is short (dis = 1.49). Accessibility to radial highways is at the maximum (rad = 24), the property-tax rate is very high (tax = 666), and the pupil-teacher ratio is on the high end (ptratio = 20.2). Lastly, the proportion of lower-status residents is near the maximum (lstat = 30.6) while the median home value is at the minimum (medv = 5). These factors collectively suggest an urban, high-crime, industrial area with economic challenges.
rm_over_7 <- subset(Boston, rm>7)
nrow(rm_over_7)
[1] 64
rm_over_8 <- subset(Boston, rm>8)
nrow(rm_over_8)
[1] 13
summary(rm_over_8)
      crim              zn            indus             chas       
Min. :0.02009 Min. : 0.00 Min. : 2.680 Min. :0.0000
1st Qu.:0.33147 1st Qu.: 0.00 1st Qu.: 3.970 1st Qu.:0.0000
Median :0.52014 Median : 0.00 Median : 6.200 Median :0.0000
Mean :0.71879 Mean :13.62 Mean : 7.078 Mean :0.1538
3rd Qu.:0.57834 3rd Qu.:20.00 3rd Qu.: 6.200 3rd Qu.:0.0000
Max. :3.47428 Max. :95.00 Max. :19.580 Max. :1.0000
nox rm age dis
Min. :0.4161 Min. :8.034 Min. : 8.40 Min. :1.801
1st Qu.:0.5040 1st Qu.:8.247 1st Qu.:70.40 1st Qu.:2.288
Median :0.5070 Median :8.297 Median :78.30 Median :2.894
Mean :0.5392 Mean :8.349 Mean :71.54 Mean :3.430
3rd Qu.:0.6050 3rd Qu.:8.398 3rd Qu.:86.50 3rd Qu.:3.652
Max. :0.7180 Max. :8.780 Max. :93.90 Max. :8.907
rad tax ptratio lstat medv
Min. : 2.000 Min. :224.0 Min. :13.00 Min. :2.47 Min. :21.9
1st Qu.: 5.000 1st Qu.:264.0 1st Qu.:14.70 1st Qu.:3.32 1st Qu.:41.7
Median : 7.000 Median :307.0 Median :17.40 Median :4.14 Median :48.3
Mean : 7.462 Mean :325.1 Mean :16.36 Mean :4.31 Mean :44.2
3rd Qu.: 8.000 3rd Qu.:307.0 3rd Qu.:17.40 3rd Qu.:5.12 3rd Qu.:50.0
Max. :24.000 Max. :666.0 Max. :20.20 Max. :7.44 Max. :50.0
answer: There are 64 neighborhoods averaging more than 7 rooms per dwelling and 13 averaging more than 8. The tracts with more than 8 rooms have notably high home values (median medv = 48.3, i.e., $48,300, versus 21.2 for the data set as a whole), low crime (median crim = 0.52), and a low proportion of lower-status residents (median lstat = 4.14).
5 ISL Exercise 3.7.3 (12pts)
- Salary = 50 + 20⋅(GPA) + 0.07⋅(IQ) + 35⋅(Level) + 0.01⋅(GPA×IQ) − 10⋅(GPA×Level)
i. Salary = 50 + 20 * GPA + 0.07 * IQ + 35 * College + 0.01 * GPA * IQ - 10 * GPA * College. For a fixed IQ and GPA, high school graduates earn an average of 50 + 20 * GPA + 0.07 * IQ + 0.01 * GPA * IQ, while college graduates earn an average of 50 + 20 * GPA + 0.07 * IQ + 35 + 0.01 * GPA * IQ - 10 * GPA. Subtracting the common terms, college graduates earn 35 - 10 * GPA more than high school graduates. Whether this difference is positive depends on GPA, so we cannot say that high school graduates always earn more. This statement is uncertain.
ii. By the same argument, we cannot say that college graduates always earn more either. This statement is also uncertain.
iii. Since the college advantage is 35 - 10 * GPA, it becomes negative once GPA > 3.5; that is, for a high enough GPA, high school graduates earn more on average than college graduates. This statement is true.
iv. This is the opposite of iii, so this statement is false.
answer: The answer is iii.
We estimate that a college graduate earns 50 + 20⋅(GPA) + 0.07⋅(IQ) + 35⋅(1) + 0.01⋅(GPA×IQ) − 10⋅(GPA×1). Plugging in IQ = 110 and GPA = 4.0: 50 + 20⋅4.0 + 0.07⋅110 + 35⋅(1) + 0.01⋅(4.0×110) − 10⋅(4.0×1) = 137.1. answer: The estimated salary of a college graduate with an IQ of 110 and a GPA of 4.0 is $137,100.
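The arithmetic can be verified directly in R (the variable names are for illustration only):

```r
gpa <- 4.0
iq <- 110
level <- 1  # 1 = college graduate
salary <- 50 + 20 * gpa + 0.07 * iq + 35 * level +
  0.01 * gpa * iq - 10 * gpa * level
salary  # 137.1, i.e., $137,100 since salary is in thousands
```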
answer: False, the magnitude of a coefficient is not an indicator of statistical significance; significance depends on the coefficient's standard error, i.e., on its t-statistic and p-value.
6 ISL Exercise 3.7.15 (20pts)
- Model: crim = β0 + β1⋅zn + ϵ
data = Boston
boston.zn <- lm(crim ~ zn, data=Boston)
summary(boston.zn)
Call:
lm(formula = crim ~ zn, data = Boston)
Residuals:
Min 1Q Median 3Q Max
-4.429 -4.222 -2.620 1.250 84.523
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.45369 0.41722 10.675 < 2e-16 ***
zn -0.07393 0.01609 -4.594 5.51e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 8.435 on 504 degrees of freedom
Multiple R-squared: 0.04019, Adjusted R-squared: 0.03828
F-statistic: 21.1 on 1 and 504 DF, p-value: 5.506e-06
par(mfrow = c(2, 2))
plot(boston.zn)
We can see that the F-statistic is 21.1 and the p-value is 5.506e-06, so we reject the null hypothesis H0: β1 = 0. There is a statistically significant association between crim and zn.
Model: crim = β0 + β1⋅indus + ϵ
boston.indus <- lm(crim ~ indus, data=Boston)
summary(boston.indus)
Call:
lm(formula = crim ~ indus, data = Boston)
Residuals:
Min 1Q Median 3Q Max
-11.972 -2.698 -0.736 0.712 81.813
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.06374 0.66723 -3.093 0.00209 **
indus 0.50978 0.05102 9.991 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 7.866 on 504 degrees of freedom
Multiple R-squared: 0.1653, Adjusted R-squared: 0.1637
F-statistic: 99.82 on 1 and 504 DF, p-value: < 2.2e-16
par(mfrow = c(2, 2))
plot(boston.indus)
We can see that the F-statistic is 99.82 and the p-value is < 2.2e-16, so we reject the null hypothesis H0: β1 = 0. There is a statistically significant association between crim and indus.
Model: crim = β0 + β1⋅chas + ϵ
boston.chas <- lm(crim ~ chas, data=Boston)
summary(boston.chas)
Call:
lm(formula = crim ~ chas, data = Boston)
Residuals:
Min 1Q Median 3Q Max
-3.738 -3.661 -3.435 0.018 85.232
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.7444 0.3961 9.453 <2e-16 ***
chas -1.8928 1.5061 -1.257 0.209
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 8.597 on 504 degrees of freedom
Multiple R-squared: 0.003124, Adjusted R-squared: 0.001146
F-statistic: 1.579 on 1 and 504 DF, p-value: 0.2094
par(mfrow = c(2, 2))
plot(boston.chas)
We can see that the F-statistic is 1.579 and the p-value is 0.2094, so we fail to reject the null hypothesis H0: β1 = 0. There is not a statistically significant association between crim and chas.
Model: crim = β0 + β1⋅nox + ϵ
boston.nox <- lm(crim ~ nox, data=Boston)
summary(boston.nox)
Call:
lm(formula = crim ~ nox, data = Boston)
Residuals:
Min 1Q Median 3Q Max
-12.371 -2.738 -0.974 0.559 81.728
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -13.720 1.699 -8.073 5.08e-15 ***
nox 31.249 2.999 10.419 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 7.81 on 504 degrees of freedom
Multiple R-squared: 0.1772, Adjusted R-squared: 0.1756
F-statistic: 108.6 on 1 and 504 DF, p-value: < 2.2e-16
par(mfrow = c(2, 2))
plot(boston.nox)
We can see that the F-statistic is 108.6 and the p-value is < 2.2e-16, so we reject the null hypothesis H0: β1 = 0. There is a statistically significant association between crim and nox.
Model: crim = β0 + β1⋅rm + ϵ
boston.rm <- lm(crim ~ rm, data=Boston)
summary(boston.rm)
Call:
lm(formula = crim ~ rm, data = Boston)
Residuals:
Min 1Q Median 3Q Max
-6.604 -3.952 -2.654 0.989 87.197
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 20.482 3.365 6.088 2.27e-09 ***
rm -2.684 0.532 -5.045 6.35e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 8.401 on 504 degrees of freedom
Multiple R-squared: 0.04807, Adjusted R-squared: 0.04618
F-statistic: 25.45 on 1 and 504 DF, p-value: 6.347e-07
par(mfrow = c(2, 2))
plot(boston.rm)
We can see that the F-statistic is 25.45 and the p-value is 6.347e-07, so we reject the null hypothesis H0: β1 = 0. There is a statistically significant association between crim and rm.
Model: crim = β0 + β1⋅age + ϵ
boston.age <- lm(crim ~ age, data=Boston)
summary(boston.age)
Call:
lm(formula = crim ~ age, data = Boston)
Residuals:
Min 1Q Median 3Q Max
-6.789 -4.257 -1.230 1.527 82.849
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -3.77791 0.94398 -4.002 7.22e-05 ***
age 0.10779 0.01274 8.463 2.85e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 8.057 on 504 degrees of freedom
Multiple R-squared: 0.1244, Adjusted R-squared: 0.1227
F-statistic: 71.62 on 1 and 504 DF, p-value: 2.855e-16
par(mfrow = c(2, 2))
plot(boston.age)
We can see that the F-statistic is 71.62 and the p-value is 2.855e-16, so we reject the null hypothesis H0: β1 = 0. There is a statistically significant association between crim and age.
Model: crim = β0 + β1⋅dis + ϵ
boston.dis <- lm(crim ~ dis, data=Boston)
summary(boston.dis)
Call:
lm(formula = crim ~ dis, data = Boston)
Residuals:
Min 1Q Median 3Q Max
-6.708 -4.134 -1.527 1.516 81.674
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 9.4993 0.7304 13.006 <2e-16 ***
dis -1.5509 0.1683 -9.213 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 7.965 on 504 degrees of freedom
Multiple R-squared: 0.1441, Adjusted R-squared: 0.1425
F-statistic: 84.89 on 1 and 504 DF, p-value: < 2.2e-16
par(mfrow = c(2, 2))
plot(boston.dis)
We can see that the F-statistic is 84.89 and the p-value is < 2.2e-16, so we reject the null hypothesis H0: β1 = 0. There is a statistically significant association between crim and dis.
Model: crim = β0 + β1⋅rad + ϵ
boston.rad <- lm(crim ~ rad, data=Boston)
summary(boston.rad)
Call:
lm(formula = crim ~ rad, data = Boston)
Residuals:
Min 1Q Median 3Q Max
-10.164 -1.381 -0.141 0.660 76.433
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.28716 0.44348 -5.157 3.61e-07 ***
rad 0.61791 0.03433 17.998 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 6.718 on 504 degrees of freedom
Multiple R-squared: 0.3913, Adjusted R-squared: 0.39
F-statistic: 323.9 on 1 and 504 DF, p-value: < 2.2e-16
par(mfrow = c(2, 2))
plot(boston.rad)
We can see that the F-statistic is 323.9 and the p-value is < 2.2e-16, so we reject the null hypothesis H0: β1 = 0. There is a statistically significant association between crim and rad.
Model: crim = β0 + β1⋅tax + ϵ
boston.tax <- lm(crim ~ tax, data=Boston)
summary(boston.tax)
Call:
lm(formula = crim ~ tax, data = Boston)
Residuals:
Min 1Q Median 3Q Max
-12.513 -2.738 -0.194 1.065 77.696
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -8.528369 0.815809 -10.45 <2e-16 ***
tax 0.029742 0.001847 16.10 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 6.997 on 504 degrees of freedom
Multiple R-squared: 0.3396, Adjusted R-squared: 0.3383
F-statistic: 259.2 on 1 and 504 DF, p-value: < 2.2e-16
par(mfrow = c(2, 2))
plot(boston.tax)
We can see that the F-statistic is 259.2 and the p-value is < 2.2e-16, so we reject the null hypothesis H0: β1 = 0. There is a statistically significant association between crim and tax.
Model: crim = β0 + β1⋅ptratio + ϵ
boston.ptratio <- lm(crim ~ ptratio, data=Boston)
summary(boston.ptratio)
Call:
lm(formula = crim ~ ptratio, data = Boston)
Residuals:
Min 1Q Median 3Q Max
-7.654 -3.985 -1.912 1.825 83.353
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -17.6469 3.1473 -5.607 3.40e-08 ***
ptratio 1.1520 0.1694 6.801 2.94e-11 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 8.24 on 504 degrees of freedom
Multiple R-squared: 0.08407, Adjusted R-squared: 0.08225
F-statistic: 46.26 on 1 and 504 DF, p-value: 2.943e-11
par(mfrow = c(2, 2))
plot(boston.ptratio)
We can see that the F-statistic is 46.26 and the p-value is 2.943e-11, so we reject the null hypothesis H0: β1 = 0. There is a statistically significant association between crim and ptratio.
Model: crim = β0 + β1⋅lstat + ϵ
boston.lstat <- lm(crim ~ lstat, data=Boston)
summary(boston.lstat)
Call:
lm(formula = crim ~ lstat, data = Boston)
Residuals:
Min 1Q Median 3Q Max
-13.925 -2.822 -0.664 1.079 82.862
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -3.33054 0.69376 -4.801 2.09e-06 ***
lstat 0.54880 0.04776 11.491 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 7.664 on 504 degrees of freedom
Multiple R-squared: 0.2076, Adjusted R-squared: 0.206
F-statistic: 132 on 1 and 504 DF, p-value: < 2.2e-16
par(mfrow = c(2, 2))
plot(boston.lstat)
We can see that the F-statistic is 132 and the p-value is < 2.2e-16, so we reject the null hypothesis H0: β1 = 0. There is a statistically significant association between crim and lstat.
Model: crim = β0 + β1⋅medv + ϵ
boston.medv <- lm(crim ~ medv, data=Boston)
summary(boston.medv)
Call:
lm(formula = crim ~ medv, data = Boston)
Residuals:
Min 1Q Median 3Q Max
-9.071 -4.022 -2.343 1.298 80.957
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 11.79654 0.93419 12.63 <2e-16 ***
medv -0.36316 0.03839 -9.46 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 7.934 on 504 degrees of freedom
Multiple R-squared: 0.1508, Adjusted R-squared: 0.1491
F-statistic: 89.49 on 1 and 504 DF, p-value: < 2.2e-16
par(mfrow = c(2, 2))
plot(boston.medv)
We can see that the F-statistic is 89.49 and the p-value is < 2.2e-16, so we reject the null hypothesis H0: β1 = 0. There is a statistically significant association between crim and medv.
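Rather than fitting the twelve univariate models one at a time as above, they can be produced in a single pass. A sketch, assuming Boston is loaded as above (the names `predictors`, `uni_fits`, and `uni_slopes` are illustrative):

```r
# Fit crim ~ x for every other column in one loop (assumes Boston is loaded)
predictors <- setdiff(names(Boston), "crim")
uni_fits <- lapply(predictors, function(v) {
  lm(reformulate(v, response = "crim"), data = Boston)
})
# Extract the slope from each fit, named by predictor
uni_slopes <- setNames(sapply(uni_fits, function(m) coef(m)[2]), predictors)
round(uni_slopes, 4)
```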
lm.all <- lm(crim ~.,data = Boston)
summary(lm.all)
Call:
lm(formula = crim ~ ., data = Boston)
Residuals:
Min 1Q Median 3Q Max
-8.534 -2.248 -0.348 1.087 73.923
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 13.7783938 7.0818258 1.946 0.052271 .
zn 0.0457100 0.0187903 2.433 0.015344 *
indus -0.0583501 0.0836351 -0.698 0.485709
chas -0.8253776 1.1833963 -0.697 0.485841
nox -9.9575865 5.2898242 -1.882 0.060370 .
rm 0.6289107 0.6070924 1.036 0.300738
age -0.0008483 0.0179482 -0.047 0.962323
dis -1.0122467 0.2824676 -3.584 0.000373 ***
rad 0.6124653 0.0875358 6.997 8.59e-12 ***
tax -0.0037756 0.0051723 -0.730 0.465757
ptratio -0.3040728 0.1863598 -1.632 0.103393
lstat 0.1388006 0.0757213 1.833 0.067398 .
medv -0.2200564 0.0598240 -3.678 0.000261 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 6.46 on 493 degrees of freedom
Multiple R-squared: 0.4493, Adjusted R-squared: 0.4359
F-statistic: 33.52 on 12 and 493 DF, p-value: < 2.2e-16
answer: The predictors that are statistically significant in the multiple regression are zn, dis, rad, and medv (p-values below 0.05), so for these we can reject the null hypothesis H0: βj = 0.
x = c(coefficients(boston.zn)[2],
coefficients(boston.indus)[2],
coefficients(boston.chas)[2],
coefficients(boston.nox)[2],
coefficients(boston.rm)[2],
coefficients(boston.age)[2],
coefficients(boston.dis)[2],
coefficients(boston.rad)[2],
coefficients(boston.tax)[2],
coefficients(boston.ptratio)[2],
coefficients(boston.lstat)[2],
coefficients(boston.medv)[2])
y = coefficients(lm.all)[2:13]
plot(x, y, col = "blue",pch =19, ylab = "multiple regression coefficients",
xlab = "Univariate Regression coefficients",
main = "Relationship between Multiple regression \n and univariate regression coefficients")
Model: crim = β0 + β1(zn) + β2(zn)² + β3(zn)³ + ε
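The twelve univariate slopes collected by hand above can also be gathered with a loop; a compact alternative sketch, again assuming ISLR2's Boston:

```r
library(ISLR2)  # assumed source of the Boston data

predictors <- setdiff(names(Boston), "crim")
# Slope from each simple regression crim ~ predictor
uni_slopes <- sapply(predictors, function(p)
  coef(lm(reformulate(p, response = "crim"), data = Boston))[2])
# Slopes from the full multiple regression, intercept dropped
multi_slopes <- coef(lm(crim ~ ., data = Boston))[-1]
plot(uni_slopes, multi_slopes, col = "blue", pch = 19,
     xlab = "Univariate regression coefficients",
     ylab = "Multiple regression coefficients",
     main = "Univariate vs. multiple regression coefficients")
```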
boston.poly.zn = lm(crim ~ zn + I(zn^2) + I(zn^3), data = Boston)
summary(boston.poly.zn)
Call:
lm(formula = crim ~ zn + I(zn^2) + I(zn^3), data = Boston)
Residuals:
Min 1Q Median 3Q Max
-4.821 -4.614 -1.294 0.473 84.130
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.846e+00 4.330e-01 11.192 < 2e-16 ***
zn -3.322e-01 1.098e-01 -3.025 0.00261 **
I(zn^2) 6.483e-03 3.861e-03 1.679 0.09375 .
I(zn^3) -3.776e-05 3.139e-05 -1.203 0.22954
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 8.372 on 502 degrees of freedom
Multiple R-squared: 0.05824, Adjusted R-squared: 0.05261
F-statistic: 10.35 on 3 and 502 DF, p-value: 1.281e-06
Model: crim = β0 + β1(indus) + β2(indus)² + β3(indus)³ + ε
boston.poly.indus = lm(crim ~ indus + I(indus^2) + I(indus^3), data = Boston)
summary(boston.poly.indus)
Call:
lm(formula = crim ~ indus + I(indus^2) + I(indus^3), data = Boston)
Residuals:
Min 1Q Median 3Q Max
-8.278 -2.514 0.054 0.764 79.713
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.6625683 1.5739833 2.327 0.0204 *
indus -1.9652129 0.4819901 -4.077 5.30e-05 ***
I(indus^2) 0.2519373 0.0393221 6.407 3.42e-10 ***
I(indus^3) -0.0069760 0.0009567 -7.292 1.20e-12 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 7.423 on 502 degrees of freedom
Multiple R-squared: 0.2597, Adjusted R-squared: 0.2552
F-statistic: 58.69 on 3 and 502 DF, p-value: < 2.2e-16
Model: crim = β0 + β1(chas) + β2(chas)² + β3(chas)³ + ε
# chas is a 0/1 dummy variable, so chas^2 and chas^3 both equal chas;
# one polynomial term is therefore dropped as collinear (singularity below)
boston.poly.chas = lm(crim ~ + I(chas^2) + I(chas^3), data = Boston)
summary(boston.poly.chas)
Call:
lm(formula = crim ~ +I(chas^2) + I(chas^3), data = Boston)
Residuals:
Min 1Q Median 3Q Max
-3.738 -3.661 -3.435 0.018 85.232
Coefficients: (1 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.7444 0.3961 9.453 <2e-16 ***
I(chas^2) -1.8928 1.5061 -1.257 0.209
I(chas^3) NA NA NA NA
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 8.597 on 504 degrees of freedom
Multiple R-squared: 0.003124, Adjusted R-squared: 0.001146
F-statistic: 1.579 on 1 and 504 DF, p-value: 0.2094
Model: crim = β0 + β1(nox) + β2(nox)² + β3(nox)³ + ε
boston.poly.nox = lm(crim ~ nox + I(nox^2) + I(nox^3), data = Boston)
summary(boston.poly.nox)
Call:
lm(formula = crim ~ nox + I(nox^2) + I(nox^3), data = Boston)
Residuals:
Min 1Q Median 3Q Max
-9.110 -2.068 -0.255 0.739 78.302
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 233.09 33.64 6.928 1.31e-11 ***
nox -1279.37 170.40 -7.508 2.76e-13 ***
I(nox^2) 2248.54 279.90 8.033 6.81e-15 ***
I(nox^3) -1245.70 149.28 -8.345 6.96e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 7.234 on 502 degrees of freedom
Multiple R-squared: 0.297, Adjusted R-squared: 0.2928
F-statistic: 70.69 on 3 and 502 DF, p-value: < 2.2e-16
Model: crim = β0 + β1(rm) + β2(rm)² + β3(rm)³ + ε
boston.poly.rm = lm(crim ~ rm + I(rm^2) + I(rm^3), data = Boston)
summary(boston.poly.rm)
Call:
lm(formula = crim ~ rm + I(rm^2) + I(rm^3), data = Boston)
Residuals:
Min 1Q Median 3Q Max
-18.485 -3.468 -2.221 -0.015 87.219
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 112.6246 64.5172 1.746 0.0815 .
rm -39.1501 31.3115 -1.250 0.2118
I(rm^2) 4.5509 5.0099 0.908 0.3641
I(rm^3) -0.1745 0.2637 -0.662 0.5086
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 8.33 on 502 degrees of freedom
Multiple R-squared: 0.06779, Adjusted R-squared: 0.06222
F-statistic: 12.17 on 3 and 502 DF, p-value: 1.067e-07
Model: crim = β0 + β1(age) + β2(age)² + β3(age)³ + ε
boston.poly.age = lm(crim ~ age + I(age^2) + I(age^3), data = Boston)
summary(boston.poly.age)
Call:
lm(formula = crim ~ age + I(age^2) + I(age^3), data = Boston)
Residuals:
Min 1Q Median 3Q Max
-9.762 -2.673 -0.516 0.019 82.842
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.549e+00 2.769e+00 -0.920 0.35780
age 2.737e-01 1.864e-01 1.468 0.14266
I(age^2) -7.230e-03 3.637e-03 -1.988 0.04738 *
I(age^3) 5.745e-05 2.109e-05 2.724 0.00668 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 7.84 on 502 degrees of freedom
Multiple R-squared: 0.1742, Adjusted R-squared: 0.1693
F-statistic: 35.31 on 3 and 502 DF, p-value: < 2.2e-16
Model: crim = β0 + β1(dis) + β2(dis)² + β3(dis)³ + ε
boston.poly.dis = lm(crim ~ dis + I(dis^2) + I(dis^3), data = Boston)
summary(boston.poly.dis)
Call:
lm(formula = crim ~ dis + I(dis^2) + I(dis^3), data = Boston)
Residuals:
Min 1Q Median 3Q Max
-10.757 -2.588 0.031 1.267 76.378
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 30.0476 2.4459 12.285 < 2e-16 ***
dis -15.5543 1.7360 -8.960 < 2e-16 ***
I(dis^2) 2.4521 0.3464 7.078 4.94e-12 ***
I(dis^3) -0.1186 0.0204 -5.814 1.09e-08 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 7.331 on 502 degrees of freedom
Multiple R-squared: 0.2778, Adjusted R-squared: 0.2735
F-statistic: 64.37 on 3 and 502 DF, p-value: < 2.2e-16
Model: crim = β0 + β1(rad) + β2(rad)² + β3(rad)³ + ε
boston.poly.rad = lm(crim ~ rad + I(rad^2) + I(rad^3), data = Boston)
summary(boston.poly.rad)
Call:
lm(formula = crim ~ rad + I(rad^2) + I(rad^3), data = Boston)
Residuals:
Min 1Q Median 3Q Max
-10.381 -0.412 -0.269 0.179 76.217
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.605545 2.050108 -0.295 0.768
rad 0.512736 1.043597 0.491 0.623
I(rad^2) -0.075177 0.148543 -0.506 0.613
I(rad^3) 0.003209 0.004564 0.703 0.482
Residual standard error: 6.682 on 502 degrees of freedom
Multiple R-squared: 0.4, Adjusted R-squared: 0.3965
F-statistic: 111.6 on 3 and 502 DF, p-value: < 2.2e-16
Model: crim = β0 + β1(tax) + β2(tax)² + β3(tax)³ + ε
boston.poly.tax = lm(crim ~ tax + I(tax^2) + I(tax^3), data = Boston)
summary(boston.poly.tax)
Call:
lm(formula = crim ~ tax + I(tax^2) + I(tax^3), data = Boston)
Residuals:
Min 1Q Median 3Q Max
-13.273 -1.389 0.046 0.536 76.950
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.918e+01 1.180e+01 1.626 0.105
tax -1.533e-01 9.568e-02 -1.602 0.110
I(tax^2) 3.608e-04 2.425e-04 1.488 0.137
I(tax^3) -2.204e-07 1.889e-07 -1.167 0.244
Residual standard error: 6.854 on 502 degrees of freedom
Multiple R-squared: 0.3689, Adjusted R-squared: 0.3651
F-statistic: 97.8 on 3 and 502 DF, p-value: < 2.2e-16
Model: crim = β0 + β1(ptratio) + β2(ptratio)² + β3(ptratio)³ + ε
boston.poly.ptratio = lm(crim ~ ptratio + I(ptratio^2) + I(ptratio^3), data = Boston)
summary(boston.poly.ptratio)
Call:
lm(formula = crim ~ ptratio + I(ptratio^2) + I(ptratio^3), data = Boston)
Residuals:
Min 1Q Median 3Q Max
-6.833 -4.146 -1.655 1.408 82.697
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 477.18405 156.79498 3.043 0.00246 **
ptratio -82.36054 27.64394 -2.979 0.00303 **
I(ptratio^2) 4.63535 1.60832 2.882 0.00412 **
I(ptratio^3) -0.08476 0.03090 -2.743 0.00630 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 8.122 on 502 degrees of freedom
Multiple R-squared: 0.1138, Adjusted R-squared: 0.1085
F-statistic: 21.48 on 3 and 502 DF, p-value: 4.171e-13
Model: crim = β0 + β1(lstat) + β2(lstat)² + β3(lstat)³ + ε
boston.poly.lstat = lm(crim ~ lstat + I(lstat^2) + I(lstat^3), data = Boston)
summary(boston.poly.lstat)
Call:
lm(formula = crim ~ lstat + I(lstat^2) + I(lstat^3), data = Boston)
Residuals:
Min 1Q Median 3Q Max
-15.234 -2.151 -0.486 0.066 83.353
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.2009656 2.0286452 0.592 0.5541
lstat -0.4490656 0.4648911 -0.966 0.3345
I(lstat^2) 0.0557794 0.0301156 1.852 0.0646 .
I(lstat^3) -0.0008574 0.0005652 -1.517 0.1299
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 7.629 on 502 degrees of freedom
Multiple R-squared: 0.2179, Adjusted R-squared: 0.2133
F-statistic: 46.63 on 3 and 502 DF, p-value: < 2.2e-16
Model: crim = β0 + β1(medv) + β2(medv)² + β3(medv)³ + ε
boston.poly.medv = lm(crim ~ medv + I(medv^2) + I(medv^3), data = Boston)
summary(boston.poly.medv)
Call:
lm(formula = crim ~ medv + I(medv^2) + I(medv^3), data = Boston)
Residuals:
Min 1Q Median 3Q Max
-24.427 -1.976 -0.437 0.439 73.655
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 53.1655381 3.3563105 15.840 < 2e-16 ***
medv -5.0948305 0.4338321 -11.744 < 2e-16 ***
I(medv^2) 0.1554965 0.0171904 9.046 < 2e-16 ***
I(medv^3) -0.0014901 0.0002038 -7.312 1.05e-12 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 6.569 on 502 degrees of freedom
Multiple R-squared: 0.4202, Adjusted R-squared: 0.4167
F-statistic: 121.3 on 3 and 502 DF, p-value: < 2.2e-16
answer: For indus, nox, dis, ptratio, and medv, the squared and cubed terms are statistically significant (p-values below 0.05), so we can reject the null hypothesis of a purely linear relationship: these predictors show evidence of a non-linear association with crim. For age, the linear term is not significant, but the squared (p = 0.047) and cubed (p = 0.007) terms are, which also suggests a non-linear relationship. For the remaining predictors (zn, chas, rm, rad, tax, lstat), the higher-order terms are not significant, so there is no evidence of a non-linear relationship with crim.
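Rather than reading individual t tests, the linear and cubic fits can also be compared directly with a nested-model F test; a sketch for medv, assuming ISLR2's Boston:

```r
library(ISLR2)  # assumed source of the Boston data

fit_lin <- lm(crim ~ medv, data = Boston)
fit_cub <- lm(crim ~ medv + I(medv^2) + I(medv^3), data = Boston)
# F test of H0: the squared and cubed coefficients are both zero;
# a small p-value is evidence of non-linearity in medv
anova(fit_lin, fit_cub)
```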
7 Bonus question (20pts)
For multiple linear regression, show that \(R^2\) is equal to the correlation between the response vector \(\mathbf{y} = (y_1, \ldots, y_n)^T\) and the fitted values \(\hat{\mathbf{y}} = (\hat y_1, \ldots, \hat y_n)^T\). That is \[ R^2 = 1 - \frac{\text{RSS}}{\text{TSS}} = [\operatorname{Cor}(\mathbf{y}, \hat{\mathbf{y}})]^2. \]
answer: Recall that \[ \operatorname{Cor}(\mathbf{x}, \mathbf{y}) = \frac{\sum_{i}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i}(x_i - \bar{x})^2 \sum_{i}(y_i - \bar{y})^2}} = \frac{\mathbf{x}^T C \mathbf{y}}{\sqrt{(\mathbf{x}^T C \mathbf{x})(\mathbf{y}^T C \mathbf{y})}}, \] where \(C = I - \frac{1}{n} J\) is the centering matrix and \[ J = \mathbf{1}\mathbf{1}^T = \begin{bmatrix} 1 & 1 & \dots & 1 \\ 1 & 1 & \dots & 1 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & \dots & 1 \end{bmatrix}_{n \times n}. \] Note that \(C\) is symmetric and idempotent (\(C^T C = C\)), which is why a single \(C\) appears in each quadratic form.
Writing \(\hat{\mathbf{y}} = X\hat\beta = X(X^TX)^{-1}X^T\mathbf{y} = H\mathbf{y}\), where \(H = X(X^TX)^{-1}X^T\) is the projection (hat) matrix and \(I - H\) is also a symmetric, idempotent projection, \[ R^2 = 1 - \frac{\text{RSS}}{\text{TSS}} = 1 - \frac{\sum_{i}(y_i - \hat{y}_i)^2}{\sum_{i}(y_i - \bar{y})^2} = 1 - \frac{\mathbf{y}^T(I - H)\mathbf{y}}{\mathbf{y}^T C \mathbf{y}}. \]
Let \(\mathbf{e} = \mathbf{y} - \hat{\mathbf{y}} = (I - H)\mathbf{y}\) denote the residuals. Because the model contains an intercept, \(\mathbf{1}\) lies in the column space of \(X\), so \(H\mathbf{1} = \mathbf{1}\); hence the fitted values have the same mean \(\bar y\) as \(\mathbf{y}\), and \(\mathbf{e}\) is orthogonal to both \(\hat{\mathbf{y}}\) and \(\mathbf{1}\). In particular \(\mathbf{e}^T C \hat{\mathbf{y}} = \mathbf{e}^T(\hat{\mathbf{y}} - \bar{y}\mathbf{1}) = 0\) and \(C\mathbf{e} = \mathbf{e}\), which give \[ \mathbf{y}^T C \hat{\mathbf{y}} = (\hat{\mathbf{y}} + \mathbf{e})^T C \hat{\mathbf{y}} = \hat{\mathbf{y}}^T C \hat{\mathbf{y}} \quad \text{and} \quad \mathbf{y}^T C \mathbf{y} = \hat{\mathbf{y}}^T C \hat{\mathbf{y}} + \mathbf{e}^T \mathbf{e}. \]
Therefore \[ [\operatorname{Cor}(\mathbf{y}, \hat{\mathbf{y}})]^2 = \frac{(\mathbf{y}^T C \hat{\mathbf{y}})^2}{(\mathbf{y}^T C \mathbf{y})(\hat{\mathbf{y}}^T C \hat{\mathbf{y}})} = \frac{(\hat{\mathbf{y}}^T C \hat{\mathbf{y}})^2}{(\mathbf{y}^T C \mathbf{y})(\hat{\mathbf{y}}^T C \hat{\mathbf{y}})} = \frac{\hat{\mathbf{y}}^T C \hat{\mathbf{y}}}{\mathbf{y}^T C \mathbf{y}} = \frac{\mathbf{y}^T C \mathbf{y} - \mathbf{e}^T\mathbf{e}}{\mathbf{y}^T C \mathbf{y}} = 1 - \frac{\text{RSS}}{\text{TSS}} = R^2. \]
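The identity holds for any least-squares fit with an intercept, so it can be verified numerically; a quick check using the built-in mtcars data (no extra packages):

```r
# R^2 reported by summary() should equal the squared correlation
# between the response and the fitted values
fit <- lm(mpg ~ wt + hp, data = mtcars)
r2     <- summary(fit)$r.squared
cor_sq <- cor(mtcars$mpg, fitted(fit))^2
all.equal(r2, cor_sq)  # TRUE up to floating-point tolerance
```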